
COVID-19 Health Effects and Social Media

STAT405/605 Final Presentation

Group 1

4/17/2022

Credits

This presentation was created by Ben Allen, Anthony Cai, Rohit Jha, Daniel Prasca, and Ivan Paredes.

COVID-19 case data: Centers for Disease Control and Prevention, COVID-19 Response. COVID-19 Case Surveillance Public Use Data with Geography (version date: April 04, 2022)

COVID-19 tweet data: Gupta, Raj, Vishwanath, Ajay, and Yang, Yinping. COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-11-04. https://doi.org/10.3886/E120321V11

Ancillary Data Credits

Life Expectancy: National Center for Health Statistics. U.S. Life Expectancy at Birth by State and Census Tract - 2010-2015. Date accessed [4/20/22]. Available from https://data.cdc.gov/d/5h56-n989.

Hypertension: Centers for Disease Control and Prevention, Division for Heart Disease and Stroke Prevention

State Population Totals: Population, Population Change, and Estimated Components of Population Change: April 1, 2010 to July 1, 2019 (NST-EST2019-alldata)

Abstract

Our group translated large COVID-19 data sets into visualizations that depict the broader effects of COVID-19 on public health and its prevalence on social media platforms.

Briefly, our primary dataset includes COVID-19 positive-case counts by age group, sex, race, and geography.

Our secondary dataset includes Twitter user data with tweets that contain COVID-19 related keywords.

Graphs and figures were generated in R, with SQLite used to manipulate the large databases.
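
As a rough illustration of the kind of database manipulation described above, the sketch below counts cases per state and month inside SQLite so that only a small summary table needs to be loaded for plotting. It uses Python's built-in `sqlite3` module rather than the R workflow used for this presentation. The table name `cases`, the columns `case_month` and `res_state`, and the sample rows are hypothetical stand-ins, not the actual schema or data.

```python
import sqlite3

# Hypothetical, tiny stand-in for the case surveillance table:
# one row per reported case (month of the case, state of residence).
rows = [
    ("2020-03", "TX"),
    ("2020-03", "TX"),
    ("2020-03", "NY"),
    ("2020-04", "NY"),
    ("2020-04", "NY"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_month TEXT, res_state TEXT)")
conn.executemany("INSERT INTO cases VALUES (?, ?)", rows)

# Aggregate inside the database: count cases per state and month.
query = """
    SELECT res_state, case_month, COUNT(*) AS n_cases
    FROM cases
    GROUP BY res_state, case_month
    ORDER BY res_state, case_month
"""
monthly_counts = conn.execute(query).fetchall()
conn.close()

for state, month, n_cases in monthly_counts:
    print(state, month, n_cases)
```

Grouping in SQL keeps the row-level data on disk, which matters when the raw table is far larger than memory.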

Primary Data Set Overview

Cumulative US Case Count

Case Count Breakdown By State (1/7)

Case Count Breakdown By State (2/7)

Case Count Breakdown By State (3/7)

Case Count Breakdown By State (4/7)

Case Count Breakdown By State (5/7)

Case Count Breakdown By State (6/7)

Case Count Breakdown By State (7/7)

COVID-19 Cases Over Time, Stacked By State

Case Count Box Plots

Case Count Box Plots 2

Further Breakdown of Case Data

Secondary Data Set Overview

Our next data set contains tweets relating to COVID-19. Each record includes the tweet ID, the timestamp, and the country the tweet came from. The data set has over 168 million entries, one for each recorded tweet relating to COVID-19 worldwide.

The authors also used AI to generate an emotional rating for each tweet, split into happiness, sadness, anger, and fear.

Our hypothesis is that the number of tweets relating to COVID-19 could be used to predict the number of COVID cases at a given time.
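
One simple way to probe this hypothesis is an ordinary least-squares fit of case counts on tweet counts over matching time periods. The sketch below is a minimal illustration in Python, not the analysis behind the slides; the function name `fit_line` and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy data (hypothetical): tweet counts and case counts for five
# matching time periods, in arbitrary units.
tweets = [1, 2, 3, 4, 5]
cases = [2, 4, 5, 4, 5]

slope, intercept, r_squared = fit_line(tweets, cases)
print(f"cases = {slope:.2f} * tweets + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

A low R² on real data would indicate that tweet volume alone explains little of the variation in case counts.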

Secondary Data Set Overview

We can see that “covid” was by far the most common keyword, likely because it is less formal than a term like “nCoV”.

United States Tweets

Average Emotions of US Tweets

Relationship Between Cases and Tweets

Checking for Linear Relationship

Killer Plot


Hypertension and Life Expectancy

We also analyzed ancillary datasets containing life expectancy and hypertension mortality rates by state.

These factors were chosen because life expectancy has been shown in other literature to correlate with overall quality of life, and hypertension mortality rates tend to indicate the overall level of heart health in a state.

We hypothesize that these factors may correlate with COVID case numbers on a state level.
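
A state-level check of this kind can be sketched as a Pearson correlation between a health indicator and population-adjusted case counts. The example below is written in Python rather than R and is only an illustration. It assumes that "case density" means cases per 100,000 residents, and the function names and toy figures are hypothetical. It also reports the t statistic, t = r·sqrt(n − 2) / sqrt(1 − r²), from which a p-value follows using Student's t distribution with n − 2 degrees of freedom.

```python
from math import sqrt


def cases_per_100k(cases, population):
    """Population-adjusted case count (assumed meaning of 'case density')."""
    return cases * 100_000 / population


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy data for five hypothetical states (not real figures).
life_expectancy = [75, 76, 77, 78, 79]
case_counts = [100_000, 40_000, 50_000, 80_000, 20_000]
populations = [2_000_000, 1_000_000, 1_000_000, 2_000_000, 1_000_000]

density = [cases_per_100k(c, p) for c, p in zip(case_counts, populations)]

r = pearson_r(life_expectancy, density)
n = len(density)
# t statistic for testing whether the true correlation is zero.
t_stat = r * sqrt(n - 2) / sqrt(1 - r * r)

print(f"r = {r:.4f}, t = {t_stat:.4f}, df = {n - 2}")
```

Dividing by population first matters: raw case counts mostly track state size, which would swamp any relationship with the health indicator.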

COVID-19 and Hypertension (1/2)

COVID-19 and Hypertension (2/2)

P-value = 0.8894, Cor = -0.02084582

COVID-19 and Life Expectancy (1/2)

COVID-19 and Life Expectancy (2/2)

P-value = 0.0456, Cor = -0.2869576

Conclusion

  • The number of tweets is only loosely correlated with case count. It is not particularly useful for prediction on its own, though it may be useful as a feature in a more complex ML model. It should not be used in a linear model, since we did not find a statistically significant linear relationship.

  • Hypertension mortality does not appear to have a relationship with COVID-19 case density.

  • There was a statistically significant (α=0.05) inverse linear relationship between average life expectancy per state and COVID-19 case density. This suggests that the factors associated with a higher average life expectancy may also be associated with fewer COVID-19 cases in an area, although a correlation alone does not establish causation.